How did the rise in technology usage (social media, internet, etc.) affect historically successful businesses/industries?

library(tidyverse)
library(dplyr)
library(ggplot2)
library(forcats)
library(lubridate)
library(sparklyr)
library(ggthemes)
library(gridExtra)
library(readxl)
library(stringr)
library(corrr)
library(here)
library(corrgram)
library(readr)
library(ggpubr)
library(data.table)
setwd("/Users/franklinglance/Library/CloudStorage/OneDrive-UniversityofVirginia/UVA/Spring 2022/SYS 2202/Final Project/Fortune_500_Analysis")

Importing Data

fortune1955_2021 <- read_csv("data/fortune500_1955_2021.csv")  # rank + revenue data for fortune 500 companies
percent_using_internet <- read_excel("data/percent_of_population_using_internet.xlsx")
technology_adoption_by_households_in_the_united_states <- read_csv("data/technology-adoption-by-households-in-the-united-states.csv")

computing_efficiency <- read_csv("data/computing-efficiency.csv")
# data on various business indicators (see metadata)
data_bisiness_indicators <- read_csv("data/data_business_indicators/aee336d2-021b-4021-912c-3de7fd9d2729_Data.csv")

# data on supercomputer power
supercomputer_power_flops <- read_csv("data/supercomputer-power-flops.csv")

# data on phone subscriptions
landline_cellular_data <- read_excel("data/landline_cellular_data.xlsx")

# company info
company_info <- read_csv("data/Companies.csv")

# massive company dataset (7 million companies)
company_infov2 <- read_csv("data/companies_sorted.csv")

# Excel sheet I created for mapping industries to stock market sectors
industries_to_sector <- read_excel("data/industries_to_sector.xlsx")

# Moore's law (transistors per microprocessor)
moores_law <- read_csv("data/transistors-per-microprocessor.csv")

# historical data on computer memory cost 
computer_storage_cost <- read_csv("data/historical-cost-of-computer-memory-and-storage.csv")
Dataset Information

fortune1955_2021: rank and revenue data for Fortune 500 companies from 1955 to 2021, read from data/fortune500_1955_2021.csv.

Analysis Overview

  1. Prepare Dataset Representing Fortune 500 Companies
  2. First round analysis on Fortune 500 dataset.
  3. Clarify Assumptions/Goals
  4. Quantify Success
  5. Analyze relationship between technological factors and success
  6. Summarize Results

Dataset Summary

Fortune 500 Data

Technological Factors Data

Part 1: Preparing Fortune 500 Dataset

Prepare fortune1955_2021 and company_infov2 for joining, then join by company name

# add "Company" column to the Fortune 500 time-series dataset
fortune1955_2021 <- fortune1955_2021 %>%
  mutate(Company = Name)
# rename the name column in company_infov2 (information on ~7 million companies), used to fill in missing information on the time-series data
company_infov2 <- company_infov2 %>% 
  rename(Company = name)
# make names lowercase
lowercase_companies_1955_2021 <- fortune1955_2021 %>%
    mutate(Company = tolower(Company))
# note: names in company_infov2 are not unique, so one Fortune 500 row can match several records
fortune_joined_v2 <- lowercase_companies_1955_2021 %>%
  left_join(company_infov2, by = "Company") 
# fixing data types
#fortune_joined_v2 <- fortune_joined_v2 %>%
 # mutate(Revenue = as.numeric(Revenue))
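The printed output later in this document shows single Fortune 500 entries matched to several company_infov2 records (for example, four rows for Apple in 2021), because company names are not unique in the directory. A minimal sketch of detecting and crudely resolving such duplicates, using hypothetical toy tibbles; keeping the first match is an assumption and may pick the wrong record:

```r
library(dplyr)

# Hypothetical toy data: two Fortune 500 rows, and a company directory in which
# the name "apple" appears twice (as many names do in company_infov2).
fortune <- tibble(Year = c(2021, 2021),
                  Company = c("walmart", "apple"),
                  Revenue = c(559, 274))
directory <- tibble(Company = c("walmart", "apple", "apple"),
                    industry = c("retail", "consumer electronics", "computer software"))

# The join returns one row per match, so "apple" 2021 appears twice
joined <- left_join(fortune, directory, by = "Company")

# Company-year pairs that matched more than one directory record
dupes <- joined %>%
  count(Company, Year) %>%
  filter(n > 1)

# Crude fix (an assumption, not the author's method): keep the first match
deduped <- joined %>%
  distinct(Company, Year, .keep_all = TRUE)
```

A more careful fix would match on an additional field (such as a domain or founding year) rather than keeping whichever record happens to come first.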

Tidying Fortune Joined Dataset

Change revenue to be in billions of dollars

# convert Revenue (reported in millions of dollars) to billions; kept as a
# double so that revenues under $1 billion are not truncated to 0
fortune_joined_v2 <- fortune_joined_v2 %>%
  mutate(Revenue = as.numeric(Revenue) / 1000)

Part 2: First Round Analysis on Fortune 500 Dataset

What Constitutes Success?

Can rank and revenue be used synonymously when talking about success?

There is a clear relationship between rank and revenue: mean revenue falls steadily as the rank number increases. Because the Fortune 500 is ordered by revenue, the two measures carry the same ordering information, so either can be used to quantify success. Rank will be more convenient later, since binning a range of ranks is simpler than defining revenue buckets.

fortune_joined_v2 %>%
  filter(Year > 1985) %>%
  group_by(Rank) %>%
  summarise(mean_revenue_by_rank = mean(Revenue, na.rm = TRUE)) %>%
  ggplot(aes(Rank, mean_revenue_by_rank)) +
  geom_smooth() + 
  labs(title = "Mean Revenue by Rank, Fortune 500 from 1985 to 2021", 
       subtitle = "Displays correlation between rank and revenue") +
  ylab("Mean Revenue (in billions)") + 
  scale_x_continuous(trans = "reverse")

Plotting revenue on a log10 scale gives an even clearer picture of the relationship between rank and revenue; stat_cor() reports the correlation between rank and log revenue.

fortune_joined_v2 %>%
  filter(Year > 1985) %>%
  group_by(Rank) %>%
  summarise(mean_revenue_by_rank = mean(Revenue, na.rm = TRUE)) %>%
  ggplot(aes(Rank, log10(mean_revenue_by_rank))) +
  geom_smooth() + 
  labs(title = "Mean Revenue by Rank, Fortune 500 from 1985 to 2021, Log curve", 
       subtitle = "Displays correlation between rank and revenue") +
  ylab("Mean Revenue (in billions), logarithmic") + 
  stat_cor()  +
  scale_x_continuous(trans = "reverse")
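Within a given year, rank and revenue are monotonically related by construction, but the curve is far from linear. Spearman's rank correlation measures monotonic association directly, while Pearson correlation (the stat_cor() default) rewards linearity, which is why the log10 transform tightens it. A small sketch on hypothetical toy values:

```r
# Hypothetical toy values: mean revenue (billions) for ranks 1 to 5
ranks <- 1:5
mean_revenue <- c(400, 150, 90, 60, 45)

# Spearman correlation works on ranks only, so a strictly decreasing
# relationship gives exactly -1 regardless of the curve's shape
rho <- cor(ranks, mean_revenue, method = "spearman")

# Pearson correlation on raw vs log10 revenue, for comparison
r_raw <- cor(ranks, mean_revenue)
r_log <- cor(ranks, log10(mean_revenue))
```

On these toy values the log10 version has the larger absolute Pearson correlation, mirroring the improvement seen in the log plot.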

Recoding Fortune 500 Data

Initially, I planned to group by industry; however, there are too many categories (126).

fortune_joined_v2 %>%
  group_by(industry) %>%
  summarise(unique(industry))
## # A tibble: 126 × 2
##    industry                `unique(industry)`     
##    <chr>                   <chr>                  
##  1 accounting              accounting             
##  2 airlines/aviation       airlines/aviation      
##  3 apparel & fashion       apparel & fashion      
##  4 architecture & planning architecture & planning
##  5 arts and crafts         arts and crafts        
##  6 automotive              automotive             
##  7 aviation & aerospace    aviation & aerospace   
##  8 banking                 banking                
##  9 biotechnology           biotechnology          
## 10 broadcast media         broadcast media        
## # … with 116 more rows

Since there are 126 unique industries, the Fortune 500 dataset needs to be recoded into broader categories before trends can be analyzed. The 11 sectors of the stock market are a natural choice: the stocks of companies within the same sector tend to trade in the same direction, providing a reference for the sector as a whole, and companies in the same sector are often affected by the same (or similar) factors.

Reference for Sector Breakdown: https://time.com/nextadvisor/investing/stock-market-sectors/

The sectors are: Energy, Materials, Industrials, Utilities, Healthcare, Financials, Consumer Discretionary, Consumer Staples, Information Technology, Communication Services, and Real Estate.

I created an Excel spreadsheet mapping the current “industries” to stock market sectors.

# 126 unique industries
industries <- fortune_joined_v2 %>%
  distinct(industry)
# export to csv
write.csv(industries, "industries.csv")
# use industries_to_sector dataframe to recode industry column
industries_to_sector <- industries_to_sector %>%
  mutate(industry = Industry)
# fortune joined now contains sector info
fortune_joined_v3 <- fortune_joined_v2 %>% 
  left_join(industries_to_sector, by = "industry")
fortune_joined_v3 <- fortune_joined_v3 %>%
  filter(Sector != "NA")
# fixing data types
fortune_joined_v3 <- fortune_joined_v3 %>%
  mutate(Year = as.numeric(Year),
         Revenue = as.numeric(Revenue))

This process produces fortune_joined_v3, the primary dataset for the rest of the analysis.
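Rows whose industry has no entry in industries_to_sector receive Sector = NA in the left_join and are then dropped by the Sector filter. A quick check for which industries fall through, sketched with hypothetical toy tibbles standing in for fortune_joined_v2 and industries_to_sector, uses anti_join():

```r
library(dplyr)

# Hypothetical toy inputs; "widgets" is deliberately missing from the mapping
companies <- tibble(Company  = c("ibm", "merck", "acme"),
                    industry = c("information technology and services",
                                 "pharmaceuticals", "widgets"))
mapping <- tibble(industry = c("information technology and services", "pharmaceuticals"),
                  Sector   = c("Information Technology", "Healthcare"))

# Industries present in the data but absent from the mapping sheet
unmapped <- companies %>%
  distinct(industry) %>%
  anti_join(mapping, by = "industry")

# Same recode-then-drop pattern as above
with_sector <- companies %>%
  left_join(mapping, by = "industry") %>%
  filter(!is.na(Sector))
```

Reviewing unmapped before filtering shows how many companies the recode silently discards.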

Visualizing + Analyzing Fortune 500 after Sector Grouping

Which companies have been consistently on the Fortune 500?

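First, we will list companies that have been on the Fortune 500 for more than 20 years since 1985. This gives an idea of how many companies stay on the list for an extended period of time.

Results: 317 companies have been on the Fortune 500 for more than 20 years since 1985, a number equal to over half the size of the list. Note that the counts below are row counts over the full 1955 to 2021 period; values above 67 (e.g., IBM at 469) show that the name-based join matched several company_infov2 records to the same Fortune 500 entry.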

head(fortune_joined_v3 %>% count(Company, sort=TRUE), n=20)
##                 Company   n
## 1                   ibm 469
## 2                   cbs 360
## 3                harris 288
## 4                mosaic 272
## 5                 merck 268
## 6       hewlett-packard 256
## 7  state farm insurance 243
## 8                   fmc 224
## 9                   itt 216
## 10               singer 204
## 11  international paper 201
## 12     northrop grumman 201
## 13              pepsico 201
## 14  united technologies 195
## 15                  trw 177
## 16         baker hughes 176
## 17                eaton 174
## 18                  amp 170
## 19              tenneco 168
## 20            microsoft 162
# filtering by companies on the list for more than 20 years, since 1985
output <- fortune_joined_v3 %>%  # hiding output for knit
  group_by(Company) %>%
  filter(Year > 1985) %>%
  filter(n() > 20) %>%
  summarise(years = n()) %>%   # one row per company with its row count
  arrange(desc(years))

Now we will filter for companies that have been on the Fortune 500 the entire time.

Results: 168 companies have been on the Fortune 500 for the entire period (1985-2021). These are typically large, long-established corporations. They are very successful, but they may be less innovative: a constant market presence brings constant pressure to please shareholders, which can push companies toward tried-and-true methods rather than the risky leaps that produce a unicorn startup.

output2 <- fortune_joined_v3 %>%  # hiding output for knit
  group_by(Company) %>%
  filter(Year > 1985) %>%
  filter(n() > 35) %>%
  summarise(unique(Company))
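The counts above use n(), which counts rows rather than years. Since the name-based join can attach several company_infov2 records to one Fortune 500 entry, n() can overstate how long a company was listed. A minimal sketch of counting distinct years instead, on a hypothetical toy panel:

```r
library(dplyr)

# Hypothetical toy panel: "ibm" appears twice in 1990 because of a
# duplicated join match, so it has 3 rows but only 2 distinct years.
panel <- tibble(Company = c("ibm", "ibm", "ibm", "acme"),
                Year    = c(1990, 1990, 1991, 1990))

years_listed <- panel %>%
  group_by(Company) %>%
  summarise(rows  = n(),               # what filter(n() > 20) counts
            years = n_distinct(Year),  # distinct years on the list
            .groups = "drop") %>%
  arrange(desc(years))
```

With the real data, the same idea would mark incumbents with something like n_distinct(Year) == 36 after filtering to Year > 1985; this is a suggested variant, not how the results above were computed.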

I will visualize the two different subsets of companies in the next section.

Boolean Select Columns

Next, Boolean select columns are added to fortune_joined_v3. Each holds a "True"/"False" flag marking membership in one of several subsets of the Fortune 500:

Subset       Description
Incumbent    Companies that have been on the Fortune 500 every year since 1985.
20+          Companies that have been on the Fortune 500 for more than 20 years since 1985.
IT           Companies within the Information Technology sector.
Current      Companies on the Fortune 500 in 2021.
Startup      Companies founded in 2000 or later that are on the Fortune 500 in 2021.
#incumbent column
fortune_joined_v3 <- fortune_joined_v3 %>% 
  group_by(Company) %>%
  filter(Year > 1985) %>%
  mutate(incumbent = ifelse(n()>35, "True", "False"))
# twenty year column
fortune_joined_v3 <- fortune_joined_v3 %>% 
  group_by(Company) %>%
  filter(Year > 1985) %>%
  mutate(on_for_twenty = ifelse(n()>20, "True", "False"))
# information technology
fortune_joined_v3 <- fortune_joined_v3 %>% 
  group_by(Company) %>%
  mutate(IT = ifelse(Sector == "Information Technology", "True", "False"))
# current fortune 500 companies
current <- fortune_joined_v3 %>%
  filter(Year == 2021)
# adding column holding boolean value
fortune_joined_v3 <- fortune_joined_v3 %>%
  mutate(current_fortune = ifelse(Company %in% current$Company, "True", "False")) 
# adding startup
startups <- fortune_joined_v3 %>%
  group_by(Company) %>%
  filter(current_fortune == "True") %>%
  filter(year.founded > 1999)
# adding startup column
fortune_joined_v3 <- fortune_joined_v3 %>%
  mutate(startup = ifelse(Company %in% startups$Company, "True", "False")) 

Visualizing New Subsets of the Data

Now that we have a way of narrowing in on subsets of the Fortune 500 (incumbent, on_for_twenty, IT, current_fortune, startup), we can visualize each subset to learn more about how Fortune 500 companies have performed over the years.

All Companies:

Average Revenue of Fortune 500 Companies from 1985 to 2021

This visualization shows the average revenue of a Fortune 500 company in each sector, for each year from 1985 to 2021. It serves as a baseline for the rest of the analysis.

Interpretation:

  • The Consumer Staples, Information Technology, and Healthcare sectors appear to have the most sustained and largest growth. This suggests that a company may be more likely to be successful if it is in one of these three sectors.

  • The Real Estate, Utilities, and Energy sectors have not shown promising growth in recent years.

fortune_joined_v3 %>%
  group_by(Sector, Year) %>%
  mutate(count = n()) %>%
  ggplot(aes(Year, Revenue/count)) +
  geom_col() +
  facet_wrap(~ Sector) +
  labs(title = "Average Revenue of Fortune 500 Companies from 1985 to 2021", 
       subtitle = "Displays distribution of Revenue, by sector") +
  ylab("Revenue (in billions)")
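The plotting code above divides each row's revenue by the number of rows in its sector-year and lets geom_col() stack those slices, so each bar's total height equals the sector-year mean. An equivalent and arguably more explicit approach, sketched here with a hypothetical toy tibble, computes the mean first with summarise():

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy rows: two Energy companies and one Healthcare company in 2020
toy <- tibble(Sector  = c("Energy", "Energy", "Healthcare"),
              Year    = c(2020, 2020, 2020),
              Revenue = c(10, 30, 50))

# One row per sector-year holding the mean revenue
sector_means <- toy %>%
  group_by(Sector, Year) %>%
  summarise(mean_revenue = mean(Revenue, na.rm = TRUE), .groups = "drop")

# Each bar is now a single value rather than a stack of Revenue/count slices
p <- ggplot(sector_means, aes(Year, mean_revenue)) +
  geom_col() +
  facet_wrap(~ Sector) +
  ylab("Mean revenue (in billions)")
```

Computing the mean explicitly also makes it straightforward to switch to geom_line() or geom_smooth(), which do not stack.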

Incumbent Companies:

Average Revenue for Incumbent Fortune 500 Companies

Incumbent - Companies that have been on the Fortune 500 every year since 1985

Interpretation:

  • Incumbent companies have a greater average revenue than the Fortune 500 as a whole, which suggests that incumbents are, on average, more successful than companies that have not been on the list the entire time. This makes sense: I would expect incumbents to sit in the upper half of the list, where a single down year is not enough to knock them off.

  • Consumer Staples and Information Technology look as strong as ever.

  • Healthcare appears to be a more recent addition to the Fortune 500: Healthcare revenue has grown significantly more among non-incumbent companies than among incumbents. This suggests that new technologies could have led to an increase in revenue in that sector. This point will be pursued further in part tk.

fortune_joined_v3 %>%
  filter(incumbent == "True")%>%
  group_by(Sector, Year) %>%
  mutate(count = n()) %>%
  ggplot(aes(Year, Revenue/count)) +
  geom_col() +
  facet_wrap(~ Sector) +
  labs(title = "Average Revenue for Incumbent Fortune 500 Companies from 1985 to 2021", 
       subtitle = "Displays distribution of Revenue, by sector") +
  ylab("Revenue (in billions)")

20 Year Companies:

Average Revenue for 20+ Fortune 500 Companies

20+ Definition - Companies that have been on the Fortune 500 for more than 20 of the 36 years since 1985.

Interpretation:

  • Consumer Staples and Information Technology look as strong as ever.

  • The Financials sector shows steady, almost non-decreasing growth. This suggests that successful financial companies, once they join the Fortune 500, tend to stay on it.

  • Healthcare is very strong in this subset of the Fortune 500 companies. Again, this suggests that the healthcare sector has experienced solid growth in recent years.

fortune_joined_v3
## # A tibble: 19,539 × 22
## # Groups:   Company [976]
##     Year Name    Revenue  Rank Company    ...1 domain      year.founded industry
##    <dbl> <chr>     <dbl> <dbl> <chr>     <dbl> <chr>              <dbl> <chr>   
##  1  2021 Walmart     559     1 walmart 5944912 walmartcar…         1962 retail  
##  2  2021 Walmart     559     1 walmart 2637561 <NA>                  NA retail  
##  3  2021 Amazon      386     2 amazon  4306268 amazon.com          1994 internet
##  4  2021 Amazon      386     2 amazon  6535892 <NA>                  NA warehou…
##  5  2021 Apple       274     3 apple   5735407 apple.com           1976 consume…
##  6  2021 Apple       274     3 apple   2389734 dorado.com          1998 compute…
##  7  2021 Apple       274     3 apple   5303562 <NA>                  NA consume…
##  8  2021 Apple       274     3 apple   4507089 intelpost.…           NA industr…
##  9  2021 CVS         268     4 cvs     5614397 <NA>                  NA retail  
## 10  2021 CVS         268     4 cvs     4908121 <NA>                  NA retail  
## # … with 19,529 more rows, and 13 more variables: size.range <chr>,
## #   locality <chr>, country <chr>, linkedin.url <chr>,
## #   current.employee.estimate <dbl>, total.employee.estimate <dbl>,
## #   Industry <chr>, Sector <chr>, incumbent <chr>, on_for_twenty <chr>,
## #   IT <chr>, current_fortune <chr>, startup <chr>
fortune_joined_v3 %>%
  filter(on_for_twenty == "True")%>%
  group_by(Sector, Year) %>%
  mutate(count = n()) %>%
  ggplot(aes(Year, Revenue/count)) +
  geom_col() +
  facet_wrap(~ Sector) +
  labs(title = "Average Revenue for 20+ Fortune 500 Companies from 1985 to 2021", 
       subtitle = "20+ -> Companies which have been on the Fortune 500 for more than 20 years") +
  ylab("Revenue (in billions)")

IT:

Average revenue for Information Technology Fortune 500 Companies

IT - Companies in the Information Technology stock market sector. I view IT separately because the overarching analysis focuses on the effects of technological change on Fortune 500 companies, so this sector deserves particular attention.

Interpretation:

  • This graph takes a closer look at the Information Technology sector, restricted to IT companies ranked in the top 19 of the list.

  • The IT sector appears to show non-decreasing growth year over year.

fortune_joined_v3 %>%
  filter(IT == "True", Rank < 20) %>%   # IT companies ranked 1 to 19
  group_by(Year) %>%
  summarise(mean_revenue = mean(Revenue, na.rm = TRUE)) %>%  # one mean per year
  ggplot(aes(Year, mean_revenue)) +
  geom_smooth() +
  labs(title = "Average Revenue for IT Fortune 500 Companies from 1985 to 2021", 
       subtitle = "IT -> Information Technology Sector, top 19 ranks") +
  ylab("Revenue (in billions)") 

Current Fortune 500 Companies:

Average Revenue for Fortune 500 Companies in 2021

Current Fortune 500 Companies - Companies that are currently on the Fortune 500 in 2021.

Interpretation:

  • This subset traces the history of the companies that are on the Fortune 500 today.

  • Sectors like Industrials, Materials, and Utilities have struggled in recent years, while Consumer Staples, IT, Financials, and Healthcare have grown substantially. This suggests a shift in the makeup of the Fortune 500.

fortune_joined_v3 %>%
  filter(current_fortune == "True")%>%
  group_by(Sector, Year) %>%
  mutate(count = n()) %>%
  ggplot(aes(Year, Revenue/count)) +
  geom_col() +
  facet_wrap(~ Sector) +
  labs(title = "Average Revenue for Current Fortune 500 Companies from 1985 to 2021", 
       subtitle = "Displays distribution of Revenue, by sector") +
  ylab("Revenue (in billions)")

Startup Companies:

Average Revenue for Fortune 500 Startups since 2000

Startup - A company founded in 2000 or later that is on the Fortune 500 in 2021.

Interpretation:

  • This is a subset of Current Fortune 500 Companies.

  • Startups have, on average, shown little revenue growth on the Fortune 500 over the years. This is not surprising: these companies have had at most about two decades to grow, and startups often need much longer than that to reach the scale of established incumbents.

fortune_joined_v3 %>%
  filter(startup == "True")%>%
  filter(Year > 1999) %>%
  group_by(Sector, Year) %>%
  mutate(count = n()) %>%
  ggplot(aes(Year, Revenue/count)) +
  geom_col() +
  facet_wrap(~ Sector) +
  labs(title = "Average Revenue for Fortune 500 Startups from 2000 to 2021", 
       subtitle = "Displays distribution of Revenue, by sector") +
  ylab("Revenue (in billions)")

Part 3: Clarifying Assumptions/Goals

Overarching Question: How did the rise in technology usage (social media, internet, etc.) affect historically successful businesses/industries?

Parts 1 and 2 of this Analysis serve as the foundation of the predictive half. In the second half of this analysis, I hope to use the dataset I have built to answer the overarching question.

The goal of this analysis is to determine the impact of technological advancements on “successful” businesses. Therefore, the second half of the analysis will cover the following steps:

  • Part 4: Quantifying Success: Which Fortune 500 companies are successful, and which are unsuccessful?

  • Part 5: Technological Influences: How has technology affected successful businesses? What does this mean for future investments?

  • Part 6: Summarizing Results

Part 4: Quantifying Success

The goal for this section is to narrow in on a specific selection of successful companies. In the Boolean select portion of the analysis, I visualized the general trends within the different subsets of the Fortune 500. This was the first step towards determining what constitutes a successful company. Now, I will find several companies within each subset of the Fortune 500 that model success.

Important Qualities:

  • Non-decreasing revenue per 5-year period (to allow for isolated poor years)

  • Currently on Fortune 500 in 2021

  • Subsets:

    • Communication Services Companies

    • Consumer Discretionary Companies

    • Consumer Staples Companies

    • Energy Companies

    • Financial Companies

    • Healthcare Companies

    • Industrial Companies

    • Information Technology Companies

    • Materials Companies

    • Real Estate Companies

    • Each of these subsets will contain examples of both successful and unsuccessful companies
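The first criterion can be made concrete by splitting each company's years into consecutive 5-year blocks and requiring the block averages never to decrease, so that a single poor year inside a block does not disqualify a company. A minimal sketch, assuming blocks start in 2001 and using a hypothetical toy series:

```r
library(dplyr)

# Hypothetical toy revenue series (billions) for one company, 2001 to 2010;
# 2003 and 2007 are poor years.
series <- tibble(Company = "acme", Year = 2001:2010,
                 Revenue = c(5, 6, 4, 7, 8, 9, 7, 10, 11, 12))

five_year <- series %>%
  mutate(period = (Year - 2001) %/% 5) %>%            # 2001-2005 -> 0, 2006-2010 -> 1
  group_by(Company, period) %>%
  summarise(mean_revenue = mean(Revenue), .groups = "drop_last") %>%
  # still grouped by Company: block means must never decrease
  summarise(non_decreasing = all(diff(mean_revenue) >= 0), .groups = "drop")
```

Here the block means are 6 and 9.8, so the company passes despite its two dips. With the real data, the block origin (for example 1986) is a choice to make, and a company with only one block passes trivially.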

Procedure:

  1. Determine specific filters for success.
  2. Determine specific filters for lack of success.
  3. Modify Fortune 500 Dataset to hold Boolean values representing containment in the aforementioned subsets.

Filtering for Success

# add max and min revenue value for each company
fortune_joined_v3 <- fortune_joined_v3 %>%
  group_by(Company) %>%
  filter(Year > 1985) %>%
  mutate(max_revenue = max(Revenue, na.rm = TRUE),
         min_revenue = min(Revenue, na.rm = TRUE),
         revenue_diff = max_revenue - min_revenue,
         revenue_delta = revenue_diff/(max(Year, na.rm = TRUE) - min(Year, na.rm = TRUE)),
         years_on_fortune = max(Year, na.rm = TRUE) - min(Year, na.rm = TRUE))

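# max_revenue      -> maximum revenue (billions) while on the Fortune 500 since 1986
# min_revenue      -> minimum revenue (billions) over the same period
# revenue_diff     -> max_revenue - min_revenue; a revenue range, so it is positive for shrinking companies too
# revenue_delta    -> revenue_diff divided by the span of years on the list (billions/year)
# years_on_fortune -> last year on the list minus first year (a span, not a count of years listed)

# create hyper-growth company dataset (>5 billion/year revenue growth, on average)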
hyper_growthdf <- fortune_joined_v3 %>% 
  filter(current_fortune == "True") %>% # company must be on Fortune 500 in 2021
  group_by(Company) %>%
  filter(years_on_fortune > 10) %>% # need at least 10 years of data for confidence, down to 262 companies
  filter(!Sector %in% c("Other", "NA")) %>%
  filter(revenue_delta > 5)
  
# create moderate-growth company dataset (>1 and <5 billion/year revenue growth, on average)
moderate_growthdf <- fortune_joined_v3 %>% 
  filter(current_fortune == "True") %>% # company must be on Fortune 500 in 2021
  group_by(Company) %>%
  filter(years_on_fortune > 10) %>% # need at least 10 years of data for confidence
  filter(!Sector %in% c("Other", "NA")) %>%
  filter(revenue_delta > 1 & revenue_delta < 5) # `&` is vectorized; `&&` only works on single values

# create low-growth company dataset (<1 billion/year revenue growth, on average)
low_growthdf <- fortune_joined_v3 %>% 
  filter(current_fortune == "True") %>% # company must be on Fortune 500 in 2021
  group_by(Company) %>%
  filter(years_on_fortune > 10) %>% # need at least 10 years of data for confidence
  filter(!Sector %in% c("Other", "NA")) %>%
  filter(revenue_delta < 1 )


# add boolean value for hyper-growth to fortune 500 dataset
fortune_joined_v3 <- fortune_joined_v3 %>%
  mutate(hyper_growth = ifelse(Company %in% hyper_growthdf$Company, "True", "False"))

# add boolean value for moderate-growth to fortune 500 dataset
fortune_joined_v3 <- fortune_joined_v3 %>%
  mutate(moderate_growth = ifelse(Company %in% moderate_growthdf$Company, "True", "False"))

# add boolean value for low-growth to fortune 500 dataset
fortune_joined_v3 <- fortune_joined_v3 %>%
  mutate(low_growth = ifelse(Company %in% low_growthdf$Company, "True", "False"))

Summary:

  • Added columns containing info on low, moderate, and high growth. These will be the Boolean conditions for “success”.
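The growth tiers are based on revenue_delta, which is built from max_revenue - min_revenue. Because a range is never negative, a company whose revenue fell sharply gets the same revenue_delta as one whose revenue rose by the same amount. A minimal sketch of a signed alternative, assuming one revenue value per company-year, compares revenue in the last year on the list against the first; the toy tibble is hypothetical, and reusing the 5 and 1 billion/year cutoffs as tier boundaries is an assumption, not the author's method:

```r
library(dplyr)

# Hypothetical toy data: "riser" grows from 10 to 70 billion over ten years,
# "faller" shrinks from 70 to 10 billion over the same years.
toy <- tibble(Company = rep(c("riser", "faller"), each = 2),
              Year    = rep(c(2011, 2021), times = 2),
              Revenue = c(10, 70, 70, 10))

growth <- toy %>%
  group_by(Company) %>%
  summarise(
    # range-based metric, as in revenue_delta
    range_delta  = (max(Revenue) - min(Revenue)) / (max(Year) - min(Year)),
    # signed metric: last-year revenue minus first-year revenue, per year
    signed_delta = (Revenue[which.max(Year)] - Revenue[which.min(Year)]) /
                   (max(Year) - min(Year)),
    .groups = "drop"
  ) %>%
  mutate(tier = case_when(
    signed_delta > 5 ~ "hyper",
    signed_delta > 1 ~ "moderate",
    TRUE             ~ "low"
  ))
```

Both toy companies have a range_delta of 6 billion/year, but only the riser lands in the hyper tier under the signed metric.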

Filtering for Unsuccessful Companies

# create unsuccessful company dataset (<.5 billion/year revenue growth, on average) + not currently on Fortune 500 + between 10 and 20 years on list
unsuccessful_df <- fortune_joined_v3 %>% 
  filter(current_fortune == "False") %>% # company must NOT be on Fortune 500 in 2021
  group_by(Company) %>%
  filter(years_on_fortune > 10 & years_on_fortune < 20) %>% # need at least 10 years of data for confidence, but less than 20 years
  filter(!Sector %in% c("Other", "NA")) %>%
  filter(revenue_delta < .5)

# add boolean value for unsuccessful to fortune 500 dataset
fortune_joined_v3 <- fortune_joined_v3 %>%
  mutate(unsuccessful = ifelse(Company %in% unsuccessful_df$Company, "True", "False"))

Visualizing Success

We now have Boolean values that can be used to filter down to various successful/unsuccessful companies.

Hyper Growth Companies

These companies are the most successful of the bunch. Because hyper-growth companies remain the main focus for the rest of the analysis, it is necessary to identify which companies make up this group. The following plots serve that purpose.

Results:

# plotting all names
fortune_joined_v3 %>%
  filter(hyper_growth == "True") %>%
  ggplot(aes(Year, Revenue)) +
  geom_col()+
  facet_wrap(~ Name)

# plotting all names, by sector
fortune_joined_v3 %>%
  filter(hyper_growth == "True") %>%
  ggplot(aes(Year, Revenue)) +
  geom_col()+
  facet_wrap(~ Sector)

# plotting IT companies
fortune_joined_v3 %>%
  filter(hyper_growth == "True", Sector == "Information Technology") %>%
  ggplot(aes(Year, Revenue)) +
  geom_col(aes(fill = Name))

# plotting Consumer Staples companies
fortune_joined_v3 %>%
  filter(hyper_growth == "True", Sector == "Consumer Staples") %>%
  ggplot(aes(Year, Revenue)) +
  geom_col(aes(fill = Name))

# plotting Healthcare companies
fortune_joined_v3 %>%
  filter(hyper_growth == "True", Sector == "Healthcare") %>%
  ggplot(aes(Year, Revenue)) +
  geom_col(aes(fill = Name))

# plotting Financials Companies
fortune_joined_v3 %>%
  filter(hyper_growth == "True", Sector == "Financials") %>%
  ggplot(aes(Year, Revenue)) +
  geom_col(aes(fill = Name))

# plotting Energy companies
fortune_joined_v3 %>%
  filter(hyper_growth == "True", Sector == "Energy") %>%
  ggplot(aes(Year, Revenue)) +
  geom_col(aes(fill = Name))

# plotting Industrials companies
fortune_joined_v3 %>%
  filter(hyper_growth == "True", Sector == "Industrials") %>%
  ggplot(aes(Year, Revenue)) +
  geom_col(aes(fill = Name))

Moderate Growth Companies

As you can see, even at the moderate growth tier, it already becomes counterproductive to visualize all the companies.

# plotting all names
fortune_joined_v3 %>%
  filter(moderate_growth == "True") %>%
  ggplot(aes(Company, Revenue)) +
  geom_col() + coord_flip()

It is more pragmatic to examine these companies sector by sector.

# plotting all companies, by sector
fortune_joined_v3 %>%
  filter(moderate_growth == "True") %>%
  ggplot(aes(Year, Revenue)) +
  geom_col()+
  facet_wrap(~ Sector)

# plotting IT companies
fortune_joined_v3 %>%
  filter(moderate_growth == "True", Sector == "Information Technology") %>%
  ggplot(aes(Year, Revenue)) +
  geom_col(aes()) +
  facet_wrap(~ Name)

# plotting Consumer Staples companies
fortune_joined_v3 %>%
  filter(moderate_growth == "True", Sector == "Consumer Staples") %>%
  ggplot(aes(Year, Revenue)) +
  geom_col(aes()) +
  facet_wrap(~ Name)

# plotting Healthcare companies
fortune_joined_v3 %>%
  filter(moderate_growth == "True", Sector == "Healthcare") %>%
  ggplot(aes(Year, Revenue)) +
  geom_col(aes()) +
  facet_wrap(~ Name)

# plotting Financials Companies
fortune_joined_v3 %>%
  filter(moderate_growth == "True", Sector == "Financials") %>%
  ggplot(aes(Year, Revenue)) +
  geom_col(aes()) +
  facet_wrap(~ Name)

# plotting Energy companies
fortune_joined_v3 %>%
  filter(moderate_growth == "True", Sector == "Energy") %>%
  ggplot(aes(Year, Revenue)) +
  geom_col(aes()) +
  facet_wrap(~ Name)

# plotting Industrials companies
fortune_joined_v3 %>%
  filter(moderate_growth == "True", Sector == "Industrials") %>%
  ggplot(aes(Year, Revenue)) +
  geom_col(aes()) +
  facet_wrap(~ Name)

Low Growth Companies

It is more pragmatic to examine these companies sector by sector.

# plotting all companies, by sector
fortune_joined_v3 %>%
  filter(low_growth == "True") %>%
  ggplot(aes(Year, Revenue)) +
  geom_col()+
  facet_wrap(~ Sector)

# plotting IT companies
fortune_joined_v3 %>%
  filter(low_growth == "True", Sector == "Information Technology") %>%
  ggplot(aes(Year, Revenue)) +
  geom_col(aes()) +
  facet_wrap(~ Name)

# plotting Consumer Staples companies
fortune_joined_v3 %>%
  filter(low_growth == "True", Sector == "Consumer Staples") %>%
  ggplot(aes(Year, Revenue)) +
  geom_col(aes()) +
  facet_wrap(~ Name)

# plotting Healthcare companies
fortune_joined_v3 %>%
  filter(low_growth == "True", Sector == "Healthcare") %>%
  ggplot(aes(Year, Revenue)) +
  geom_col(aes()) +
  facet_wrap(~ Name)

# plotting Financials Companies
fortune_joined_v3 %>%
  filter(low_growth == "True", Sector == "Financials") %>%
  ggplot(aes(Year, Revenue)) +
  geom_col(aes()) +
  facet_wrap(~ Name)

# plotting Energy companies
fortune_joined_v3 %>%
  filter(low_growth == "True", Sector == "Energy") %>%
  ggplot(aes(Year, Revenue)) +
  geom_col(aes()) +
  facet_wrap(~ Name)

# plotting Industrials companies
fortune_joined_v3 %>%
  filter(low_growth == "True", Sector == "Industrials") %>%
  ggplot(aes(Year, Revenue)) +
  geom_col(aes()) +
  facet_wrap(~ Name)

Unsuccessful Companies

# plotting all names
fortune_joined_v3 %>%
  filter(unsuccessful == "True") %>%
  ggplot(aes(Company, Revenue)) +
  geom_col() + coord_flip()

It is more pragmatic to examine these companies sector by sector.

# plotting all companies, by sector
fortune_joined_v3 %>%
  filter(unsuccessful == "True") %>%
  ggplot(aes(Year, Revenue)) +
  geom_col()+
  facet_wrap(~ Sector)

# plotting IT companies
fortune_joined_v3 %>%
  filter(unsuccessful == "True", Sector == "Information Technology") %>%
  ggplot(aes(Year, Revenue)) +
  geom_col(aes()) +
  facet_wrap(~ Name)

# plotting Consumer Staples companies
fortune_joined_v3 %>%
  filter(unsuccessful == "True", Sector == "Consumer Staples") %>%
  ggplot(aes(Year, Revenue)) +
  geom_col(aes()) +
  facet_wrap(~ Name)

# plotting Healthcare companies
fortune_joined_v3 %>%
  filter(unsuccessful == "True", Sector == "Healthcare") %>%
  ggplot(aes(Year, Revenue)) +
  geom_col(aes()) +
  facet_wrap(~ Name)

# plotting Financials Companies
fortune_joined_v3 %>%
  filter(unsuccessful == "True", Sector == "Financials") %>%
  ggplot(aes(Year, Revenue)) +
  geom_col(aes()) +
  facet_wrap(~ Name)

# plotting Energy companies
fortune_joined_v3 %>%
  filter(unsuccessful == "True", Sector == "Energy") %>%
  ggplot(aes(Year, Revenue)) +
  geom_col(aes()) +
  facet_wrap(~ Name)

# plotting Industrials companies
fortune_joined_v3 %>%
  filter(unsuccessful == "True", Sector == "Industrials") %>%
  ggplot(aes(Year, Revenue)) +
  geom_col(aes()) +
  facet_wrap(~ Name)

Part 5: Technological Influences

In this section, I will start by merging the datasets containing the following information into fortune_joined_v3:

  1. Technology Adoption -> technology_adoption_by_households_in_the_united_states
  2. Supercomputer Power -> supercomputer_power_flops
  3. Computing Efficiency -> computing_efficiency
  4. Internet Usage -> percent_using_internet
  5. Phone Usage -> landline_cellular_data
  6. Moore’s Law -> moores_law
  7. Computing Cost -> computer_storage_cost
| Dataset | Source | Description |
|---|---|---|
| Technology Adoption | https://ourworldindata.org/grapher/technology-adoption-by-households-in-the-united-states | Rates of diffusion and adoption of a range of technologies in the United States, measured as the percentage of US households with access to or adoption of each technology over time. |
| Supercomputer Power | https://ourworldindata.org/grapher/supercomputer-power-flops | Number of floating-point operations per second (FLOPS) carried out by the largest supercomputer in each year. |
| Computing Efficiency | https://ourworldindata.org/grapher/computing-efficiency | Computer processing efficiency, measured as the watts needed per million instructions per second (Watts per MIPS); lower values mean more efficient hardware. |
| Internet Usage | https://data.worldbank.org/indicator/IT.NET.USER.ZS | Percent of the US population using the internet, by year. |
| Phone Usage | https://data.worldbank.org/indicator/IT.CEL.SETS | Cellular and landline subscriptions in the US. |
| Moore’s Law | https://ourworldindata.org/technological-change | Real-world transistor counts per microprocessor, the data behind Moore’s law (the observation that the number of transistors on a microchip doubles roughly every two years). |
| Computing Cost | https://ourworldindata.org/technological-change | Historical cost of computer memory and storage, in US dollars per megabyte. |
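Moore's law describes exponential growth, which is why the transistor and FLOPS plots below use a log10 scale. If the count doubles every two years, N(t) = N0 * 2^((t - t0) / 2). Taking log10 gives a straight line in t with slope log10(2) / 2, about 0.15 per year. The sketch below uses an idealized series, not the OWID data. The starting value of 2,300 transistors in 1971 (roughly the Intel 4004) is only an illustrative anchor.

```r
# idealized Moore's law series: transistor count doubles every two years
years <- 1971:2021
transistors <- 2300 * 2^((years - 1971) / 2)

# on a log10 scale the series is linear in time
fit <- lm(log10(transistors) ~ years)
slope <- unname(coef(fit)["years"])

# the fitted slope matches the theoretical log10(2) / 2
c(fitted = slope, theoretical = log10(2) / 2)
```

Real transistor counts scatter around such a line. A log10 axis therefore shows whether the growth rate is steady, which a linear axis would hide.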

Joining the Dataframes

# add the external technology metrics, matched to each company-year row by Year
# (technology_adoption_by_households_in_the_united_states is not joined here)
fortune_joined_v4 <- fortune_joined_v3 %>%
  left_join(supercomputer_power_flops, by = "Year") %>%
  left_join(computing_efficiency, by = "Year") %>%
  left_join(percent_using_internet, by = "Year") %>%
  left_join(landline_cellular_data, by = "Year") %>%
  left_join(moores_law, by = "Year") %>%
  left_join(computer_storage_cost, by = "Year")

fortune_joined_v4 now pairs every company-year row with that year's value of each joined technology metric. I will use it to evaluate how each metric relates to revenue for different groups of companies.
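Every table is joined on `Year` alone. Each company-year row therefore receives that year's value of every metric. A lookup table with more than one row per year would silently duplicate company rows. The sketch below shows a minimal sanity check that compares row counts before and after the join. It uses made-up toy tables, and the names `companies`, `flops`, and `dup_lookup` are hypothetical.

```r
library(dplyr)

# toy stand-ins: two companies over three years, one metric value per year
companies <- data.frame(
  Name = rep(c("A Corp", "B Corp"), each = 3),
  Year = rep(2000:2002, times = 2),
  Revenue = c(10, 12, 15, 8, 7, 6)
)
flops <- data.frame(Year = 2000:2002, flops = c(1e12, 2e12, 4e12))

# a lookup with exactly one row per Year keeps the row count unchanged
joined <- companies %>% left_join(flops, by = "Year")
stopifnot(nrow(joined) == nrow(companies))

# a lookup with a repeated Year duplicates every matching company row
# (recent dplyr versions may also warn about a many-to-many relationship)
dup_lookup <- data.frame(Year = c(2000L, 2000L, 2001L), metric = c(1, 2, 3))
joined_dup <- companies %>% left_join(dup_lookup, by = "Year")
nrow(joined_dup)
```

Applied to this project, the equivalent check would be `stopifnot(nrow(fortune_joined_v4) == nrow(fortune_joined_v3))`.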

I will now plot the technological factors to determine which ones will be most useful for the next part of the analysis.

# exploratory plot: computing efficiency vs. revenue for hyper-growth companies
fortune_joined_v4 %>%
  filter(hyper_growth == "True") %>%
  ggplot(aes(Revenue, -log10(`Computing efficiency`))) +
  geom_smooth() +
  stat_cor() +
  labs(title = "Computing Efficiency",
       subtitle = "Computing Efficiency vs. Revenue from Hyper-Successful Companies") +
  ylab("Computing Efficiency (-log10 Watts per MIPS)")

Analysis

Hyper-Successful Companies vs. Technological Improvements

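The primary goal of this analysis is to identify which technology factors are associated with the growth of successful companies. The relationship between hyper-successful company revenue and each technology factor is therefore the central comparison. The following sections repeat the comparison for moderately successful and unsuccessful companies to reduce the influence of confounding variables. A factor that correlates about equally with every group is more likely tracking a trend shared by all companies over time than anything specific to successful ones.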
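Simulated data can illustrate the shared-trend problem. Two series that are unrelated except that both grow over time show a strong correlation in levels. That correlation largely disappears once each series is differenced year over year. The sketch below uses made-up data (`revenue` and `metric` are simulated, not project data). Differencing is one common check for this kind of confounding, not the method used in this analysis.

```r
set.seed(42)

years <- 1:60
# both series share an upward trend but have independent noise
revenue <- 5 * years + rnorm(60, sd = 10)
metric  <- 2 * years + rnorm(60, sd = 5)

# correlation of the levels is dominated by the shared time trend
cor_levels <- cor(revenue, metric)

# year-over-year changes remove the trend, leaving mostly independent noise
cor_diffs <- cor(diff(revenue), diff(metric))

round(c(levels = cor_levels, diffs = cor_diffs), 2)
```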

What is the relationship between hyper-successful companies and Technological Improvements?

# computing efficiency
fortune_joined_v4 %>%
  filter(hyper_growth == "True") %>%
  ggplot(aes(Revenue, -log10(`Computing efficiency`))) +
  geom_smooth() +
  stat_cor() +
  labs(title = "Computing Efficiency",
       subtitle = "Computing Efficiency vs. Revenue from Hyper-Successful Companies") +
  ylab("Computing Efficiency (-log10 Watts per MIPS)")

# supercomputer power
fortune_joined_v4 %>%
  filter(hyper_growth == "True") %>%
  ggplot(aes(Revenue, log10(`Floating-Point Operations per Second`))) +
  geom_smooth() +
  stat_cor() +
  labs(title = "Supercomputer Power",
       subtitle = "Floating-Point Operations/s vs. Revenue from Hyper-Successful Companies") +
  ylab("Floating-Point Operations/s (log scale)") +
  xlab("Revenue (in billions)")

# watts per MIPS (sign flipped so that higher values mean more efficient hardware)
fortune_joined_v4 %>%
  filter(hyper_growth == "True") %>%
  ggplot(aes(log10(Revenue), -`Computing efficiency`)) +
  geom_smooth() +
  stat_cor() +
  labs(title = "Watts per MIPS",
       subtitle = "Watts per MIPS vs. Revenue from Hyper-Successful Companies") +
  ylab("Watts per MIPS (negated)") +
  xlab("Revenue (log scale)")

# percent of US using the internet
fortune_joined_v4 %>%
  filter(hyper_growth == "True") %>%
  ggplot(aes(log10(Revenue), `Individuals using the Internet (% of population)`)) +
  geom_smooth() +
  stat_cor() +
  labs(title = "Internet Usage",
       subtitle = "Internet Usage vs. Revenue from Hyper-Successful Companies") +
  ylab("Percentage of US Population Using Internet") +
  xlab("Revenue (log scale)")

# Cellular Subscriptions
fortune_joined_v4 %>%
  filter(hyper_growth == "True") %>%
  ggplot(aes(Revenue, `Cellular Subscriptions`)) +
  geom_smooth() +
  stat_cor() +
  labs(title = "US Cellular Subscriptions",
       subtitle = "Number of Cellular Subscriptions vs. Revenue from Hyper-Successful Companies") +
  ylab("Number of US Cellular Subscriptions") +
  xlab("Revenue (in billions)")

# landline subscriptions (per 100 people)
fortune_joined_v4 %>%
  filter(hyper_growth == "True") %>%
  ggplot(aes(Revenue, `Landline Subscriptions (per 100 people)`)) +
  geom_smooth() +
  stat_cor() +
  labs(title = "US Landline Subscriptions",
       subtitle = "Landline Subscriptions per 100 People vs. Revenue from Hyper-Successful Companies") +
  ylab("Landline Subscriptions (per 100 people)") +
  xlab("Revenue (in billions)")

# moores law
fortune_joined_v4 %>%
  filter(hyper_growth == "True") %>%
  ggplot(aes(Revenue, log10(`Transistors per microprocessor`))) +
  geom_smooth() +
  stat_cor() +
  labs(title = "Moore's Law",
       subtitle = "Transistors per Microprocessor vs. Revenue from Hyper-Successful Companies") +
  ylab("Transistors per Microprocessor (log scale)") +
  xlab("Revenue (in billions)")

# computing cost (memory)
fortune_joined_v4 %>%
  filter(hyper_growth == "True") %>%
  ggplot(aes(Revenue, memory)) +
  geom_smooth() +
  stat_cor() +
  labs(title = "Computing Cost",
       subtitle = "Memory Cost vs. Revenue from Hyper-Successful Companies") +
  ylab("Dollars per Megabyte") +
  xlab("Revenue (in billions)")
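The eight plots above share one template and differ only in the growth flag, the metric column, its transformation, and the labels. Below is a minimal sketch of a hypothetical helper, `plot_tech_vs_revenue()`, that builds the same smoothed plot with a `stat_cor()` label for any combination of these. The argument names and toy data are assumptions, and column names passed in must match those in `fortune_joined_v4`.

```r
library(dplyr)
library(ggplot2)
library(ggpubr)

# hypothetical helper mirroring the plot template used in this section
# flag:   name of the "True"/"False" growth column (e.g. "hyper_growth")
# metric: name of the technology column; y_fun transforms it before plotting
plot_tech_vs_revenue <- function(df, flag, metric, y_fun = identity,
                                 title = metric, y_label = metric) {
  df %>%
    filter(.data[[flag]] == "True") %>%
    ggplot(aes(Revenue, y_fun(.data[[metric]]))) +
    geom_smooth() +
    stat_cor() +
    labs(title = title, x = "Revenue", y = y_label)
}

# toy data standing in for fortune_joined_v4 (values are made up)
toy <- data.frame(
  Revenue = c(5, 10, 20, 40, 80, 3, 4),
  hyper_growth = c(rep("True", 5), "False", "False"),
  `Transistors per microprocessor` = c(1e3, 1e4, 1e5, 1e6, 1e7, 1e3, 1e4),
  check.names = FALSE
)

p <- plot_tech_vs_revenue(
  toy, "hyper_growth", "Transistors per microprocessor",
  y_fun = log10, title = "Moore's Law",
  y_label = "Transistors per Microprocessor (log scale)"
)
```

With the real data, changing `flag` to `"moderate_growth"` or `"unsuccessful"` would reproduce the corresponding plots in the next two sections without copying each block.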

Moderately-Successful Companies vs. Technological Improvements

# computing efficiency
fortune_joined_v4 %>%
  filter(moderate_growth == "True") %>%
  ggplot(aes(Revenue, -log10(`Computing efficiency`))) +
  geom_smooth() +
  stat_cor() +
  labs(title = "Computing Efficiency",
       subtitle = "Computing Efficiency vs. Revenue from Moderately-Successful Companies") +
  ylab("Computing Efficiency (-log10 Watts per MIPS)")

# supercomputer power
fortune_joined_v4 %>%
  filter(moderate_growth == "True") %>%
  ggplot(aes(Revenue, log10(`Floating-Point Operations per Second`))) +
  geom_smooth() +
  stat_cor() +
  labs(title = "Supercomputer Power",
       subtitle = "Floating-Point Operations/s vs. Revenue from Moderately-Successful Companies") +
  ylab("Floating-Point Operations/s (log scale)") +
  xlab("Revenue (in billions)")

# watts per MIPS (sign flipped so that higher values mean more efficient hardware)
fortune_joined_v4 %>%
  filter(moderate_growth == "True") %>%
  ggplot(aes(log10(Revenue), -`Computing efficiency`)) +
  geom_smooth() +
  stat_cor() +
  labs(title = "Watts per MIPS",
       subtitle = "Watts per MIPS vs. Revenue from Moderately-Successful Companies") +
  ylab("Watts per MIPS (negated)") +
  xlab("Revenue (log scale)")

# percent of US using the internet
fortune_joined_v4 %>%
  filter(moderate_growth == "True") %>%
  ggplot(aes(log10(Revenue), `Individuals using the Internet (% of population)`)) +
  geom_smooth() +
  stat_cor() +
  labs(title = "Internet Usage",
       subtitle = "Internet Usage vs. Revenue from Moderately-Successful Companies") +
  ylab("Percentage of US Population Using Internet") +
  xlab("Revenue (log scale)")

# Cellular Subscriptions
fortune_joined_v4 %>%
  filter(moderate_growth == "True") %>%
  ggplot(aes(Revenue, `Cellular Subscriptions`)) +
  geom_smooth() +
  stat_cor() +
  labs(title = "US Cellular Subscriptions",
       subtitle = "Number of Cellular Subscriptions vs. Revenue from Moderately-Successful Companies") +
  ylab("Number of US Cellular Subscriptions") +
  xlab("Revenue (in billions)")

# landline subscriptions (per 100 people)
fortune_joined_v4 %>%
  filter(moderate_growth == "True") %>%
  ggplot(aes(Revenue, `Landline Subscriptions (per 100 people)`)) +
  geom_smooth() +
  stat_cor() +
  labs(title = "US Landline Subscriptions",
       subtitle = "Landline Subscriptions per 100 People vs. Revenue from Moderately-Successful Companies") +
  ylab("Landline Subscriptions (per 100 people)") +
  xlab("Revenue (in billions)")

# moores law
fortune_joined_v4 %>%
  filter(moderate_growth == "True") %>%
  ggplot(aes(Revenue, log10(`Transistors per microprocessor`))) +
  geom_smooth() +
  stat_cor() +
  labs(title = "Moore's Law",
       subtitle = "Transistors per Microprocessor vs. Revenue from Moderately-Successful Companies") +
  ylab("Transistors per Microprocessor (log scale)") +
  xlab("Revenue (in billions)")

# computing cost (memory)
fortune_joined_v4 %>%
  filter(moderate_growth == "True") %>%
  ggplot(aes(Revenue, memory)) +
  geom_smooth() +
  stat_cor() +
  labs(title = "Computing Cost",
       subtitle = "Memory Cost vs. Revenue from Moderately-Successful Companies") +
  ylab("Dollars per Megabyte") +
  xlab("Revenue (in billions)")

Unsuccessful Companies vs. Technological Improvements

# computing efficiency
fortune_joined_v4 %>%
  filter(unsuccessful == "True") %>%
  ggplot(aes(Revenue, -log10(`Computing efficiency`))) +
  geom_smooth() +
  stat_cor() +
  labs(title = "Computing Efficiency",
       subtitle = "Computing Efficiency vs. Revenue from Unsuccessful Companies") +
  ylab("Computing Efficiency (-log10 Watts per MIPS)")

# supercomputer power
fortune_joined_v4 %>%
  filter(unsuccessful == "True") %>%
  ggplot(aes(Revenue, log10(`Floating-Point Operations per Second`))) +
  geom_smooth() +
  stat_cor() +
  labs(title = "Supercomputer Power",
       subtitle = "Floating-Point Operations/s vs. Revenue from Unsuccessful Companies") +
  ylab("Floating-Point Operations/s (log scale)") +
  xlab("Revenue (in billions)")

# watts per MIPS (sign flipped so that higher values mean more efficient hardware)
fortune_joined_v4 %>%
  filter(unsuccessful == "True") %>%
  ggplot(aes(log10(Revenue), -`Computing efficiency`)) +
  geom_smooth() +
  stat_cor() +
  labs(title = "Watts per MIPS",
       subtitle = "Watts per MIPS vs. Revenue from Unsuccessful Companies") +
  ylab("Watts per MIPS (negated)") +
  xlab("Revenue (log scale)")

# percent of US using the internet
fortune_joined_v4 %>%
  filter(unsuccessful == "True") %>%
  ggplot(aes(log10(Revenue), `Individuals using the Internet (% of population)`)) +
  geom_smooth() +
  stat_cor() +
  labs(title = "Internet Usage",
       subtitle = "Internet Usage vs. Revenue from Unsuccessful Companies") +
  ylab("Percentage of US Population Using Internet") +
  xlab("Revenue (log scale)")

# Cellular Subscriptions
fortune_joined_v4 %>%
  filter(unsuccessful == "True") %>%
  ggplot(aes(Revenue, `Cellular Subscriptions`)) +
  geom_smooth() +
  stat_cor() +
  labs(title = "US Cellular Subscriptions",
       subtitle = "Number of Cellular Subscriptions vs. Revenue from Unsuccessful Companies") +
  ylab("Number of US Cellular Subscriptions") +
  xlab("Revenue (in billions)")

# landline subscriptions (per 100 people)
fortune_joined_v4 %>%
  filter(unsuccessful == "True") %>%
  ggplot(aes(Revenue, `Landline Subscriptions (per 100 people)`)) +
  geom_smooth() +
  stat_cor() +
  labs(title = "US Landline Subscriptions",
       subtitle = "Landline Subscriptions per 100 People vs. Revenue from Unsuccessful Companies") +
  ylab("Landline Subscriptions (per 100 people)") +
  xlab("Revenue (in billions)")

# moores law
fortune_joined_v4 %>%
  filter(unsuccessful == "True") %>%
  ggplot(aes(Revenue, log10(`Transistors per microprocessor`))) +
  geom_smooth() +
  stat_cor() +
  labs(title = "Moore's Law",
       subtitle = "Transistors per Microprocessor vs. Revenue from Unsuccessful Companies") +
  ylab("Transistors per Microprocessor (log scale)") +
  xlab("Revenue (in billions)")

# computing cost (memory)
fortune_joined_v4 %>%
  filter(unsuccessful == "True") %>%
  ggplot(aes(Revenue, memory)) +
  geom_smooth() +
  stat_cor() +
  labs(title = "Computing Cost",
       subtitle = "Memory Cost vs. Revenue from Unsuccessful Companies") +
  ylab("Dollars per Megabyte") +
  xlab("Revenue (in billions)")

Correlation Results

Correlation Values

| Company Group | Computing Efficiency | Supercomputer Power | Watts per MIPS | % Using Internet | Cellular | Landline | Moore’s Law | Computing Cost |
|---|---|---|---|---|---|---|---|---|
| Hyper-Successful Companies | 0.36 | 0.59 | 0.32 | 0.72 | 0.58 | -0.53 | 0.53 | -0.18 |
| Moderately Successful Companies | 0.14 | 0.29 | 0.11 | 0.34 | 0.29 | -0.23 | 0.25 | -0.14 |
| Unsuccessful Companies | 0.36 | 0.31 | 0.28 | 0.49 | 0.40 | -0.016 | 0.35 | -0.33 |
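The values above are read from the individual plots. The sketch below computes a comparable table directly in base R. It assumes the same growth flags and a `Revenue` column as in `fortune_joined_v4`, and the helper `correlation_table()` and toy data are hypothetical. The helper correlates untransformed columns, so its values will differ from plots that log-transform an axis. It will also have the opposite sign for any plot that negates an axis.

```r
# hypothetical helper: Pearson correlation of Revenue with each metric,
# computed separately for the rows where each growth flag is "True"
correlation_table <- function(df, flags, metrics) {
  rows <- lapply(flags, function(flag) {
    sub <- df[which(df[[flag]] == "True"), , drop = FALSE]
    vapply(metrics,
           function(m) cor(sub$Revenue, sub[[m]], use = "complete.obs"),
           numeric(1))
  })
  out <- do.call(rbind, rows)   # one row per flag, one column per metric
  rownames(out) <- flags
  round(out, 2)
}

# toy data (made up): internet rises with revenue only for hyper-growth rows
toy <- data.frame(
  Revenue = c(1, 2, 3, 4, 1, 2, 3, 4),
  hyper_growth = rep(c("True", "False"), each = 4),
  unsuccessful = rep(c("False", "True"), each = 4),
  internet = c(10, 20, 30, 40, 40, 30, 20, 10),
  landline = c(50, 40, 30, 20, 20, 30, 40, 50)
)

tab <- correlation_table(toy, c("hyper_growth", "unsuccessful"),
                         c("internet", "landline"))
tab
```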

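Interpretation:

  • Computing Cost, Watts per MIPS, and Computing Efficiency can be set aside. Their correlations with hyper-successful company revenue are low (|r| ≤ 0.36), and they do not separate hyper-successful from unsuccessful companies. (Computing Efficiency and Watts per MIPS are the same underlying variable plotted under different transformations.)

  • Supercomputer power correlates much more strongly with hyper-successful company revenue (0.59) than with moderately successful (0.29) or unsuccessful (0.31) company revenue. This suggests supercomputer power could be a technological advance that highly successful companies have capitalized on, although a correlation alone cannot establish that.

  • The percentage of US individuals using the internet also correlates much more strongly with hyper-successful company revenue (0.72) than with the other groups (0.34 and 0.49).

  • Cellular subscriptions have a relatively strong positive correlation with hyper-successful company revenue (0.58) compared with the other groups (0.29 and 0.40).

  • Landline subscriptions are strongly negatively correlated with hyper-successful company revenue (-0.53) but barely correlated with unsuccessful company revenue (-0.016). This disproportionate gap may be worth investigating.

  • Moore’s law (transistors per microprocessor) also shows a promising correlation with hyper-successful company revenue (0.53, vs. 0.25 and 0.35 for the other groups).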
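Fisher's z-transformation for two independent correlations can check whether a gap such as 0.59 vs. 0.29 for supercomputer power is larger than chance would produce. In the sketch below, the sample sizes `n1` and `n2` are placeholders, not the project's actual row counts. Company-year rows from the same company or the same year are not independent, so a p-value computed this way would be optimistic.

```r
# Fisher z-test for the difference between two independent Pearson correlations
compare_correlations <- function(r1, n1, r2, n2) {
  z1 <- atanh(r1)                         # Fisher transformation of each r
  z2 <- atanh(r2)
  se <- sqrt(1 / (n1 - 3) + 1 / (n2 - 3)) # standard error of z1 - z2
  z  <- (z1 - z2) / se
  p  <- 2 * pnorm(-abs(z))                # two-sided p-value
  c(z = z, p = p)
}

# hypothetical sample sizes; substitute the real row counts of each group
res <- compare_correlations(r1 = 0.59, n1 = 200, r2 = 0.29, n2 = 200)
round(res, 4)
```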

Part 6: Summary of Results

# opens the esquisse interactive ggplot2 builder in a web browser
esquisse::esquisser(viewer = "browser")